The Mathematics of Statistical Machine Translation: Parameter Estimation

Authors

  • Peter F. Brown
  • Stephen Della Pietra
  • Vincent J. Della Pietra
  • Robert L. Mercer
Abstract

We describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another. We define a concept of word-by-word alignment between such pairs of sentences. For any given pair of such sentences each of our models assigns a probability to each of the possible word-by-word alignments. We give an algorithm for seeking the most probable of these alignments. Although the algorithm is suboptimal, the alignment thus obtained accounts well for the word-by-word relationships in the pair of sentences. We have a great deal of data in French and English from the proceedings of the Canadian Parliament. Accordingly, we have restricted our work to these two languages; but we feel that because our algorithms have minimal linguistic content they would work well on other pairs of languages. We also feel, again because of the minimal linguistic content of our algorithms, that it is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus.
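
As a rough illustration of the parameter estimation and alignment search described in the abstract, the following is a minimal sketch in the spirit of the simplest of the five models, trained with EM on a toy corpus. It is not the authors' implementation. The function names `train_model1` and `best_alignment`, the toy sentences, and the iteration count are assumptions made for illustration, and the empty (null) source word is deliberately left out to keep the sketch short. For this simplest model the most probable alignment can be found exactly, one target word at a time; the suboptimal search mentioned in the abstract is not shown here.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] ~ P(f | e) with EM.

    pairs: list of (french_tokens, english_tokens) sentence pairs.
    Assumption: no NULL word is added to the English side.
    """
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    # Uniform initialisation over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        for f_sent, e_sent in pairs:
            for f in f_sent:
                # E-step: posterior that f is linked to each e in the sentence.
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(f_sent, e_sent, t):
    """For each French word, return the index of its most probable English word."""
    return [max(range(len(e_sent)), key=lambda j: t[(f, e_sent[j])]) for f in f_sent]


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    corpus = [
        (["la", "maison"], ["the", "house"]),
        (["la", "fleur"], ["the", "flower"]),
        (["maison"], ["house"]),
    ]
    table = train_model1(corpus)
    print(best_alignment(["la", "maison"], ["the", "house"], table))
```

Each French word is linked to exactly one English word, which mirrors the word-by-word alignment concept in the abstract; a fuller treatment would also allow a link to an empty word.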

Similar resources

Statistical Machine Translation : Robust parameter estimation from noisy corpus

In this report, we describe our study of the effect of noise on parameter estimation for statistical machine translation. So far, no study has been done on this topic, even though the algorithm used for parameter estimation in statistical machine translation (the EM algorithm) is known to be highly sensitive to noise. We present in detail the experiments performed to observe the influence of noise...

An English-Assamese Machine Translation System

Al-Onaizan, Y. et al., "Distortion models for statistical machine translation", In Proceedings of ACL-COLING, July 2006, pp. 529-536. Birch, A. et al., "Constraining the phrase-based, joint probability statistical translation model", In Proceedings of HLT-NAACL Workshop on Statistical Machine Translation, April 2006, pp. 154-157. Brown, P. F. et al., "The mathematics o...

Finite-state transducer-based statistical machine translation using joint probabilities

In this paper, we present our system for statistical machine translation that is based on weighted finite-state transducers. We describe the construction of the transducer, the estimation of the weights, acquisition of phrases (locally ordered tokens) and the mechanism we use for global reordering. We also present a novel approach to machine translation that uses a maximum entropy model for par...

Transductive Minimum Error Rate Training for Statistical Machine Translation

This paper investigates parameter adaptation in Statistical Machine Translation (SMT). To overcome the parameter bias-estimation problem with Minimum Error Rate Training (MERT), we extend it under a transductive learning framework, by iteratively re-estimating the parameters using both development and test data, in which the translation hypotheses of the test data are used as pseudo references. F...

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...

Journal:
  • Computational Linguistics

Volume 19

Publication year: 1993